The influence of the inactives subset generation on the performance of machine learning methods
نویسندگان
چکیده
BACKGROUND A growing popularity of machine learning methods application in virtual screening, in both classification and regression tasks, can be observed in the past few years. However, their effectiveness is strongly dependent on many different factors. RESULTS In this study, the influence of the way of forming the set of inactives on the classification process was examined: random and diverse selection from the ZINC database, MDDR database and libraries generated according to the DUD methodology. All learning methods were tested in two modes: using one test set, the same for each method of inactive molecules generation and using test sets with inactives prepared in an analogous way as for training. The experiments were carried out for 5 different protein targets, 3 fingerprints for molecules representation and 7 classification algorithms with varying parameters. It appeared that the process of inactive set formation had a substantial impact on the machine learning methods performance. CONCLUSIONS The level of chemical space limitation determined the ability of tested classifiers to select potentially active molecules in virtual screening tasks, as for example DUDs (widely applied in docking experiments) did not provide proper selection of active molecules from databases with diverse structures. The study clearly showed that inactive compounds forming training set should be representative to the highest possible extent for libraries that undergo screening.
منابع مشابه
The influence of training actives/inactives ratio on machine learning performance
In drug discovery, machine learning is widely used to classify molecules as active or inactive against a particular target. The vast majority of these methods (supervised learning) needs a training set of objects (molecules) to develop a decision rule that can be used to classify new entities (the test set) into one of the two mentioned classes [1]. A lot of studies, searching an optimal learni...
متن کاملInvestigating the performance of machine learning-based methods in classroom reverberation time estimation using neural networks (Research Article)
Classrooms, as one of the most important educational environments, play a major role in the learning and academic progress of students. reverberation time, as one of the most important acoustic parameters inside rooms, has a significant effect on sound quality. The inefficiency of classical formulas such as Sabin, caused this article to examine the use of machine learning methods as an alternat...
متن کاملA Hybrid Machine Learning Method for Intrusion Detection
Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Already various techniques of artificial intelligence have been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...
متن کاملThermal conductivity of Water-based nanofluids: Prediction and comparison of models using machine learning
Statistical methods, and especially machine learning, have been increasingly used in nanofluid modeling. This paper presents some of the interesting and applicable methods for thermal conductivity prediction and compares them with each other according to results and errors that are defined. The thermal conductivity of nanofluids increases with the volume fraction and temperature. Machine learni...
متن کاملA Hybrid Algorithm based on Deep Learning and Restricted Boltzmann Machine for Car Semantic Segmentation from Unmanned Aerial Vehicles (UAVs)-based Thermal Infrared Images
Nowadays, ground vehicle monitoring (GVM) is one of the areas of application in the intelligent traffic control system using image processing methods. In this context, the use of unmanned aerial vehicles based on thermal infrared (UAV-TIR) images is one of the optimal options for GVM due to the suitable spatial resolution, cost-effective and low volume of images. The methods that have been prop...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 5 شماره
صفحات -
تاریخ انتشار 2013